Abstract: For stationary sequences, under general local and asymptotic dependence restrictions, any
limiting point process for time normalized upcrossings of high levels is a compound Poisson process, i.e.,
there is a clustering of high upcrossings, where the underlying Poisson points represent cluster positions,
and the multiplicities correspond to cluster sizes. For such classes of stationary sequences there exists the
upcrossings index $\eta$, $0 \le \eta \le 1$, which is directly related to the extremal index $\theta$, $0 \le \theta \le 1$, for suitable
high levels. In this paper we consider the problem of estimating the upcrossings index $\eta$ for a class of
stationary sequences satisfying a mild oscillation restriction. For the proposed estimator, properties such
as consistency and asymptotic normality are studied. Finally, the performance of the estimator is assessed
through simulation studies for autoregressive processes and case studies in the fields of environment and
finance.

1 Introduction and preliminary results

Let $\mathbf{X} = \{X_n\}_{n\ge 1}$ be a stationary sequence and $\mathbf{u} = \{u_n\}_{n\ge 1}$ a sequence of real levels. We say that
$\mathbf{X}$ has an upcrossing of $u_n$ at $i$ if the event $\{X_i \le u_n < X_{i+1}\}$, $i = 1, \ldots, n$, occurs. The point process
of upcrossings of $u_n$ by the first $n$ variables of $\mathbf{X}$ is then defined by

$$\widetilde N_n(B) = \sum_{i=1}^{n-1} \mathbb{1}_{\{X_i \le u_n < X_{i+1}\}}\,\delta_{i/n}(B), \quad B \subset [0,1],\ n \ge 1, \tag{1.1}$$

where $\delta_a(\cdot)$ denotes the unit mass measure at $a$ and $\mathbb{1}_A$ the indicator of the event $A$.
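As a concrete reading of definition (1.1), the following minimal Python sketch marks the upcrossings of a level in a finite sample and returns the normalized positions $i/n$ that carry the unit masses. The function names `upcrossing_indicators` and `upcrossing_positions` are illustrative choices, not part of the original text, and the upcrossing event is taken to be $\{X_i \le u < X_{i+1}\}$.

```python
def upcrossing_indicators(x, u):
    """Entry i (0-based) is True when x[i] <= u < x[i + 1]."""
    return [x[i] <= u < x[i + 1] for i in range(len(x) - 1)]


def upcrossing_positions(x, u):
    """Normalized positions i/n (with 1-based i) of the upcrossings of u,
    i.e. the atoms of the point process in (1.1) for this sample."""
    n = len(x)
    return [(i + 1) / n for i, hit in enumerate(upcrossing_indicators(x, u)) if hit]


sample = [0.2, 0.9, 0.4, 0.95, 0.97, 0.1, 0.8]
print(upcrossing_indicators(sample, 0.5))
print(upcrossing_positions(sample, 0.5))
```

Counting the `True` entries gives the total number of upcrossings of the level by the sample.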

Ferreira [3] showed that if $\mathbf{X}$ satisfies the mixing condition $\Delta(\mathbf{u})$, introduced in Hsing et al. [12],
and $\widetilde N_n$ converges in distribution (as a point process on $[0,1]$), then the limit is necessarily a compound
Poisson process. For independent and identically distributed (i.i.d.) sequences it is easy to see that $\widetilde N_n$
converges in distribution if and only if $u_n = u_n^{(\nu)}$ for some $\nu > 0$, where $u_n^{(\nu)}$ are normalized levels for
upcrossings, that is, $\mathbf{u}^{(\nu)} = \{u_n^{(\nu)}\}_{n\ge 1}$ denotes a sequence that satisfies

$$\lim_{n\to+\infty} nP(X_1 \le u_n^{(\nu)} < X_2) = \nu, \tag{1.2}$$

the limit point process being a Poisson process on $[0,1]$ with intensity $\nu$. We remark that when $\mathbf{X}$ is a
stationary sequence satisfying the long range dependence condition $D(\mathbf{u}^{(\nu)})$ of Leadbetter [12] and the
mild oscillation restriction $D''(\mathbf{u}^{(\nu)})$ of Leadbetter and Nandagopalan [13], $\widetilde N_n$ converges in distribution
to the same Poisson process as in the i.i.d. case (Leadbetter and Nandagopalan [13]). When $\mathbf{X}$ only
satisfies condition $\Delta(\mathbf{u}^{(\nu)})$ the limit compound Poisson process $\widetilde N$ is in fact a direct consequence of the
clustering of high level upcrossings in a dependent sequence. Its intensity is simply $\eta\nu$, where $\eta$ is the
upcrossings index, and it has multiplicity distribution $\pi(j) = \lim_{n\to+\infty} \pi_n(j)$, $j = 1, 2, \ldots$, with

$$\pi_n(j) = P\Big(\sum_{i=1}^{r_n} \mathbb{1}_{\{X_i \le u_n < X_{i+1}\}} = j \,\Big|\, \sum_{i=1}^{r_n} \mathbb{1}_{\{X_i \le u_n < X_{i+1}\}} > 0\Big), \quad j = 1, 2, \ldots \tag{1.3}$$

for some sequence $r_n = [n/k_n]$ and $\{k_n\}_{n\ge 1}$ a sequence of positive integers satisfying

$$k_n \xrightarrow[n\to+\infty]{} +\infty, \quad \frac{k_n l_n}{n} \xrightarrow[n\to+\infty]{} 0, \quad k_n\alpha_{n,l_n} \xrightarrow[n\to+\infty]{} 0, \tag{1.4}$$

where $\alpha_{n,l_n}$ are the mixing coefficients of the $\Delta(\mathbf{u})$ condition.

The upcrossings of $u_n$ by $X_i$ with $i \in J_{n,j} = \{(j-1)r_n + 1, \ldots, jr_n\}$, for some $j = 1, \ldots, k_n$, are
regarded as forming a cluster of upcrossings and $\pi_n(j)$, $j = 1, 2, \ldots$, is called the conditional cluster size
distribution, since it is simply the distribution of the number of upcrossings in a cluster (i.e. in a set
$J_{n,j}$) given that there is at least one.

The upcrossings index can then be viewed as a measure of clustering of upcrossings of high levels $u_n$
by the variables in $\mathbf{X}$ and is formally defined as follows.

Definition 1.1 (Ferreira [4]) If for each $\nu > 0$ there exists $\mathbf{u}^{(\nu)} = \{u_n^{(\nu)}\}_{n\ge 1}$ and
$\lim_{n\to+\infty} P(\widetilde N_n([0,1]) = 0) = e^{-\eta\nu}$ for the levels $u_n = u_n^{(\nu)}$, for some constant $0 \le \eta \le 1$, then we say
that the sequence $\mathbf{X}$ has upcrossings index $\eta$.

Many common cases, such as i.i.d. sequences or sequences that satisfy condition $D''(\mathbf{u}^{(\nu)})$, have upcrossings
index $\eta = 1$. A value $\eta < 1$ indicates clustering of upcrossings of $\mathbf{u}^{(\nu)}$, giving rise to multiplicities in the
limit.

If for each $\nu > 0$ there exists $u_n^{(\nu)}$ satisfying (1.2) and $\lim_{n\to+\infty} nP(X_1 > u_n^{(\nu)}) = \tau$ for some $\tau > 0$,
then the upcrossings index $\eta$ exists if and only if the extremal index $\theta$ exists and, in this case,

$$\theta = \frac{\nu}{\tau}\,\eta \tag{1.5}$$

(Ferreira [1]). Note that the previous conditions imply that $\theta \le \eta$, since if a level $u_n$ is simultaneously
normalized for upcrossings and for exceedances we necessarily have $\nu \le \tau$.

The conditional cluster size distribution $\pi_n$ defined in (1.3) is also related to the upcrossings index $\eta$
in the following interesting manner, which is a direct consequence of Lemma 2.1 of Ferreira [1].

Proposition 1.1 Suppose $\mathbf{X}$ satisfies $\Delta(\mathbf{u}^{(\nu)})$, $\nu > 0$, has upcrossings index $\eta > 0$ and let $\pi_n$ be defined
as in (1.3) with $u_n = u_n^{(\nu)}$. Then

$$\lim_{n\to+\infty} \sum_{j\ge 1} j\,\pi_n(j) = \frac{1}{\eta}.$$

The reciprocal of the upcrossings index can then be interpreted as the limiting mean cluster size
of upcrossings. Hence, it is natural to estimate $\eta$ as the reciprocal of the sample average cluster size.
However, such an estimator suffers from the drawback that the identification of clusters (equivalently,
the choice of $k_n$) depends on the knowledge of the mixing coefficients $\alpha_{n,l_n}$. Alternative ways of identifying
clusters of high level upcrossings are therefore a key issue for the estimation of $\eta$.
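The idea of estimating $\eta$ by the reciprocal of the sample average cluster size can be sketched as follows. This is only an illustration: the block length `r` (playing the role of $r_n$) is simply supplied by the user, which is exactly the choice that, in theory, depends on the unknown mixing coefficients. The function names are hypothetical.

```python
from collections import Counter


def block_cluster_sizes(x, u, r):
    """Number of upcrossings of u in each block of r consecutive indices,
    keeping only the blocks that contain at least one upcrossing."""
    up = [x[i] <= u < x[i + 1] for i in range(len(x) - 1)]
    sizes = []
    for start in range(0, len(up) - r + 1, r):
        count = sum(up[start:start + r])
        if count > 0:
            sizes.append(count)
    return sizes


def empirical_cluster_size_distribution(sizes):
    """Empirical analogue of the conditional cluster size distribution pi_n(j)."""
    total = len(sizes)
    return {j: c / total for j, c in sorted(Counter(sizes).items())}


def blocks_estimate(x, u, r):
    """Reciprocal of the sample average cluster size (None if no cluster is found)."""
    sizes = block_cluster_sizes(x, u, r)
    return len(sizes) / sum(sizes) if sizes else None


x = [0.1, 0.1, 0.1, 0.9, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1]
print(block_cluster_sizes(x, 0.5, 5))
print(empirical_cluster_size_distribution(block_cluster_sizes(x, 0.5, 5)))
print(blocks_estimate(x, 0.5, 5))
```

Changing `r` changes which upcrossings are grouped together, which is the practical face of the drawback mentioned above.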

If a sequence is suitably well-behaved then one might hope that groups of successive upcrossings of
$u_n$ are sufficiently far apart that each group can be regarded as a separate cluster. A sufficient condition
for this to hold is condition $\widetilde D^{(3)}(\mathbf{u})$ of Ferreira [1].

Definition 1.2 (Ferreira [4]) Let $\mathbf{X}$ be a sequence satisfying the condition $\Delta(\mathbf{u})$. $\mathbf{X}$ satisfies condition
$\widetilde D^{(3)}(\mathbf{u})$ if

$$\lim_{n\to+\infty} nP(X_1 \le u_n < X_2,\ \widetilde N_{3,3} = 0,\ \widetilde N_{4,r_n} > 0) = 0,$$

for some sequence $r_n = [n/k_n]$ with $\{k_n\}_{n\ge 1}$ satisfying (1.4) and $\widetilde N_{i,j} = \widetilde N_n([i/n, j/n])$ with $\widetilde N_{i,j} = 0$ if
$j < i$.

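Remark 1 Condition $\widetilde D^{(3)}(\mathbf{u})$ is implied by condition

$$\lim_{n\to+\infty} n \sum_{j=4}^{r_n-1} P(X_1 \le u_n < X_2,\ \widetilde N_{3,3} = 0,\ X_j \le u_n < X_{j+1}) = 0,$$

which is clearly implied by condition

$$\lim_{n\to+\infty} n \sum_{j=4}^{r_n-1} P(X_1 \le u_n < X_2,\ X_j \le u_n < X_{j+1}) = 0. \tag{1.6}$$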



Remark 2 Sequences that satisfy condition $D''(\mathbf{u})$ also satisfy condition $\widetilde D^{(3)}(\mathbf{u})$ and the latter belongs
to a wide family of local dependence conditions $\widetilde D^{(k)}(\mathbf{u})$, $k \ge 2$, introduced in Ferreira [4]. This family of
local dependence conditions is slightly stronger than the family of conditions $D^{(k)}(\mathbf{u})$, $k \ge 1$, considered
in Chernick et al. [2]. The negatively correlated uniform AR(1) process given in Chernick et al. [2] is
an example of a process that verifies the previous condition, as shown in Sebastião et al. [18]. In Ferreira
[1] we find an example of a max-autoregressive process for which $\widetilde D^{(3)}(\mathbf{u})$ also holds.

Remark 3 Necessary and sufficient conditions for (1.6) to hold, involving upcrossings-tail dependence
coefficients, can be found in Ferreira and Ferreira [6].

Condition $\widetilde D^{(3)}(\mathbf{u})$ locally restricts the dependence of the sequence, but still allows clustering of
upcrossings. It roughly states that whenever an upcrossing of a high level occurs, a cluster of upcrossings
may follow it, but once the sequence falls below the threshold it is very unlikely to upcross it again in
the nearby observations. Thus, it enables the identification of upcrossings clusters, since for this class
upcrossings may be simply identified asymptotically as runs of consecutive upcrossings and the cluster
sizes as run lengths. Indeed, Ferreira [5] proved that if the conditional upcrossing run length distribution
is defined as

$$\pi_n^*(j) = P(X_3 \le u_n < X_4, \ldots, X_{2j+1} \le u_n < X_{2j+2},\ \widetilde N_{2j+3,2j+3} = 0 \mid \widetilde N_{1,1} = 0,\ X_3 \le u_n < X_4), \quad j \ge 1, \tag{1.7}$$

then the following result gives the conditional expected length of an upcrossing run.

Proposition 1.2 (Ferreira [5]) If, for each $\nu > 0$, $\mathbf{X}$ satisfies condition $\widetilde D^{(3)}(\mathbf{u}^{(\nu)})$ and
$P(X_3 \le u_n^{(\nu)} < X_4, \ldots, X_{2j-1} \le u_n^{(\nu)} < X_{2j}) \to 0$, as $j \to +\infty$, then the upcrossings index of $\mathbf{X}$ exists and is equal to $\eta$
if and only if

$$\frac{P(X_1 \le u_n^{(\nu)} < X_2)}{P(X_1 \le u_n^{(\nu)} < X_2,\ \widetilde N_{3,3} = 0)} \xrightarrow[n\to+\infty]{} \frac{1}{\eta}$$

for each $\nu > 0$.

From this result it seems natural to estimate $\eta$ by the reciprocal of the sample average run length.
We shall consider such an estimator in Section 2.

Remark 4 Under condition $\widetilde D^{(3)}(\mathbf{u})$ the upcrossings index can be related to other dependence measures,
namely the lag-1 upcrossings tail dependence coefficient $\mu_1$, since
$\eta = 1 - \mu_1 = 1 - \lim_{x\to x_F} P(X_3 \le u_n^{(\nu)} < X_4 \mid X_1 \le u_n^{(\nu)} < X_2)$, where $x_F$ is the upper limit of the common
marginal distribution $F$ of $\mathbf{X}$, as shown in Ferreira and Ferreira [6].

Alternative expressions for $\eta$ involving only stationarity are given in the following simple lemma.

Lemma 1.1 If $nP(X_1 \le u_n < X_2,\ \widetilde N_{3,3} = 0) \xrightarrow[n\to+\infty]{} \xi > 0$ then the following are equivalent:

i) $P(\widetilde N_{3,3} = 0 \mid X_1 \le u_n < X_2) \xrightarrow[n\to+\infty]{} \eta$;

ii) $nP(X_1 \le u_n < X_2) \xrightarrow[n\to+\infty]{} \xi/\eta$;

iii) $n(1 - P(\widetilde N_{1,1} = 0,\ \widetilde N_{3,3} = 0)) \xrightarrow[n\to+\infty]{} \xi + \xi/\eta$.

Remark 5 From the stationarity of $\mathbf{X}$ we obviously have
$P(X_1 \le u_n < X_2,\ \widetilde N_{3,3} = 0) = P(\widetilde N_{1,1} = 0,\ X_3 \le u_n < X_4)$ and consequently an upcrossings run can be
identified either at its beginning or at its end.

The estimation of the upcrossings index is important not only in itself, as a measure of clustering of
upcrossings of high levels, but also because of its relation with other dependence coefficients, namely the
extremal index and the lag-1 upcrossings tail dependence coefficient (Ferreira and Ferreira [6]). Therefore, we shall
pursue this issue in the remainder of this paper. Note that under the conditions of Proposition 1.2 it is
reasonable to estimate $\eta$ by the ratio between the number of upcrossings followed by a non-upcrossing
and the number of upcrossings of a high level.

In Section 2, we formally present such an estimator, suggested in Ferreira [5], which we shall call
the runs estimator of the upcrossings index. In the subsequent sections we show that such an estimator is
typically weakly consistent and asymptotically normal. In Section 6 we carry out a simulation study of the
finite sample behaviour of the proposed estimator in a max-autoregressive process and in a first order
autoregressive process. In Section 7 the performance of the estimator is also assessed through case studies
in the fields of environment and finance. Conclusions are found in Section 8.



2 The runs estimator of the upcrossings index 

Before formally defining the runs estimator of the upcrossings index, we shall start by introducing 
some notation that will be used throughout the paper. 

Let $\widehat N_n$ denote the point process of non-upcrossings of $u_n$ followed by an upcrossing, by the first $n$
variables of $\mathbf{X}$, that is

$$\widehat N_n(B) = \sum_{i=1}^{n-3} \mathbb{1}_{\{\widetilde N_{i,i} = 0,\ X_{i+2} \le u_n < X_{i+3}\}}\,\delta_{i/n}(B), \quad B \subset [0,1],\ n \ge 1. \tag{2.1}$$

For a sequence of real levels $\{u_n\}_{n\ge 1}$ let us define the random variables $Y_i = Y_i(u_n)$, $i = 1, \ldots, n$,
corresponding to the number of consecutive upcrossings of the level $u_n$ occurring from instant $i + 2$ on,
that is,

$$Y_i = \begin{cases} 0 & \text{if } \widetilde N_{i+2,i+2} = 0, \\ k & \text{if } X_{i+2} \le u_n < X_{i+3},\ X_{i+4} \le u_n < X_{i+5},\ \ldots,\ X_{i+2k} \le u_n < X_{i+2k+1},\ \widetilde N_{i+2k+2,i+2k+2} = 0, \end{cases}$$

with $k \ge 1$. Furthermore, let us denote by $Z_i(u_n)$, $i = 1, \ldots, n$, the length of each of these sequences given
the occurrence of a non-upcrossing followed by an upcrossing at instant $i + 2$, of level $u_n$, which has
distribution $\pi_n^*(\cdot)$ given in (1.7) since

$$\pi_n^*(k) = P(Z_i(u_n) = k) = P(Y_i = k \mid \widetilde N_{i,i} = 0,\ X_{i+2} \le u_n < X_{i+3}), \quad k \ge 1, \tag{2.2}$$

independent of $i$ from the stationarity of $\mathbf{X}$.

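If $\mathbf{X}$ has upcrossings index $\eta > 0$ and the conditions of Proposition 1.2 hold then, for $u_n = u_n^{(\nu)}$,

$$\frac{1}{\eta} = \lim_{n\to+\infty} E[Z_1(u_n)] = \lim_{n\to+\infty} E[Y_1 \mid \widetilde N_{1,1} = 0,\ X_3 \le u_n < X_4]
= \lim_{n\to+\infty} \frac{P(X_1 \le u_n < X_2)}{P(\widetilde N_{1,1} = 0,\ X_3 \le u_n < X_4)} = \lim_{n\to+\infty} \frac{E[\widetilde N_n([0,1])]}{E[\widehat N_n([0,1])]}.$$

From this result it is natural to propose the non-parametric estimator for $\eta$ given by the ratio between
the total number of non-upcrossings followed by an upcrossing and the total number of upcrossings,

$$\widehat\eta_n = \widehat\eta_n(u) := \frac{\sum_{i=1}^{n-3} \mathbb{1}_{\{\widetilde N_{i,i} = 0,\ X_{i+2} \le u < X_{i+3}\}}}{\sum_{i=1}^{n-1} \mathbb{1}_{\{X_i \le u < X_{i+1}\}}}, \tag{2.3}$$

where $u$ is a suitable threshold.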
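The ratio described above (number of non-upcrossings followed, two instants later, by an upcrossing, divided by the total number of upcrossings) can be sketched in a few lines of Python. The function name `runs_estimator` and the convention of returning `None` when the level is never upcrossed are assumptions made for this illustration only.

```python
def runs_estimator(x, u):
    """Count pairs (no upcrossing at i, upcrossing at i + 2) and divide by the
    total number of upcrossings of u; None when u is never upcrossed."""
    up = [x[i] <= u < x[i + 1] for i in range(len(x) - 1)]
    total = sum(up)
    if total == 0:
        return None
    # A run of consecutive upcrossings occupies instants i + 2, i + 4, ...,
    # so a run starts at i + 2 exactly when there is no upcrossing at i.
    run_starts = sum(1 for i in range(len(up) - 2) if not up[i] and up[i + 2])
    return run_starts / total


# Upcrossings of 0.5 at (0-based) instants 2 and 4 form one run, instant 9 another.
x = [0.1, 0.1, 0.1, 0.9, 0.1, 0.9, 0.1, 0.1, 0.1, 0.1, 0.9, 0.1]
print(runs_estimator(x, 0.5))
```

In the toy sample there are three upcrossings grouped into two runs, so the estimate is the number of runs divided by the number of upcrossings.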

We shall call this estimator the runs estimator of the upcrossings index, given its similarity
with the runs estimator of the extremal index proposed by Nandagopalan [15] for stationary sequences
satisfying condition $D''(\mathbf{u})$, as suggested in Ferreira [5]. In practical applications we will always use the
estimator (2.3); nevertheless, the theoretical properties presented in the following sections will be proved
for an estimator that is asymptotically equivalent to $\widehat\eta_n$ and that we define in what follows.

Let $\widehat N_n^{(T)}$ denote the marked point process on $[0,1]$ defined by

$$\widehat N_n^{(T)}(B) = \sum_{i=1}^{n-3} T(Y_i)\,\mathbb{1}_{\{\widetilde N_{i,i} = 0,\ X_{i+2} \le u_n < X_{i+3}\}}\,\delta_{i/n}(B), \quad B \subset [0,1],\ n \ge 1, \tag{2.4}$$

where $T : \mathbb{N} \to \mathbb{R}$ is a given mapping. Thus, $\widehat N_n^{(T)}$ has mass equal to $T(Y_i)$ at the point $i/n$ whenever
$\mathbf{X}$ has a non-upcrossing at $i$ followed by an upcrossing.

If in (2.4) we consider $T(y) = 1$ we obtain the point process $\widehat N_n$ in (2.1), whereas if we consider
$T(y) = y$, $y \ge 1$, we obtain a point process, that we shall denote by $\overline N_n$, which differs from the upcrossings
point process $\widetilde N_n$ in (1.1) on an event with probability bounded by $P(X_1 \le u_n < X_2)$, which converges
to zero, as $n \to +\infty$, since the levels $u_n$ are commonly chosen to satisfy (1.2). Therefore, we can instead
consider the estimator $\widetilde\eta_n = \widehat N_n([0,1])/\overline N_n([0,1])$ since, for such levels, $\widehat\eta_n = \widetilde\eta_n + o_p(1)$. The properties
proved, in what follows, for $\widetilde\eta_n$ then also apply to $\widehat\eta_n$, which we shall use in the simulation studies.

3 Weak consistency 

We show, in this section, that $\widetilde\eta_n$ is a weakly consistent estimator of $\eta$ under mild assumptions.
Throughout it will be assumed that $\mathbf{X}$ is a stationary sequence satisfying condition $\widetilde D^{(3)}(\mathbf{u})$ and has
upcrossings index $\eta > 0$.

Note that when the level $u_n$ satisfies (1.2) there are insufficient upcrossings to give statistical
"consistency" for the estimator $\widetilde\eta_n$. That is, as $n$ increases the value of $\widetilde\eta_n$ does not necessarily converge
appropriately to the value $\eta$. Nevertheless, consistency can be achieved by the use of somewhat lower
levels. We shall therefore consider non-normalized levels, in the sense of (1.2), $v_n = u^{(\nu)}_{[n/c_n]}$ for some
fixed $\nu > 0$, that satisfy

$$nP(X_1 \le v_n < X_2) \sim c_n\nu \quad \text{as } n \to +\infty, \tag{3.1}$$

where $\{c_n\}_{n\ge 1}$ and $\{k_n\}_{n\ge 1}$ are sequences of real numbers such that $c_n, k_n \to +\infty$ and $c_n/k_n \to 0$, as $n \to +\infty$.

Note that for this sequence of levels we also have $\lim_{n\to+\infty} E[Z_1(v_n)] = 1/\eta$.

With the same arguments used in Nandagopalan [15] we obtain the following lemma, which is essential
to obtain the properties of the estimator $\widetilde\eta_n$ that we further present.

Lemma 3.1 Suppose $\{k_n\}_{n\ge 1}$ is a sequence of integers such that $k_n \to +\infty$, as $n \to +\infty$, and that there exists a
sequence $\{l_n\}_{n\ge 1}$ for which

$$k_n\big[\alpha_{n,l_n-2} + P(\widetilde N_n([0, l_n/n]) > 0)\big] \xrightarrow[n\to+\infty]{} 0. \tag{3.2}$$

Then ^ 

n— >-+oo 
for any sequence of real numbers $\{a_n\}_{n\ge 1}$, where $J_n \subset [0,1]$, $n \ge 1$, is a sequence of intervals such that,
for each $n$, $J_n \supset \bigcup_{j=1}^{k_n} J_{nj}$, with $J_{nj}$, $j = 1, \ldots, k_n$, disjoint subintervals satisfying
$m(J_n) - m\big(\bigcup_{j=1}^{k_n} J_{nj}\big) \le k_n/n$ ($m(\cdot)$ denoting Lebesgue measure).

Remark 6 Condition (3.2) is satisfied for any sequence of normalized levels $\mathbf{u}^{(\nu)}$, as in (1.2), if condition
$\Delta(\mathbf{u}^{(\nu)})$ holds, since this condition implies that $k_n\alpha_{n,l_n-2} \to 0$ and, for levels $\mathbf{u}^{(\nu)}$,
$k_nP(\widetilde N_n([0, l_n/n]) > 0) \to 0$, as $n \to +\infty$.

Henceforth, we shall write $\overline N_n = \overline N_n([0,1]) = \sum_{i=1}^{n-3} Y_i\,\mathbb{1}_{\{\widetilde N_{i,i} = 0,\ X_{i+2} \le u_n < X_{i+3}\}}$ and
$\widehat N_n = \widehat N_n([0,1]) = \sum_{i=1}^{n-3} \mathbb{1}_{\{\widetilde N_{i,i} = 0,\ X_{i+2} \le u_n < X_{i+3}\}}$, so that $\overline N_n$ and $\widehat N_n$ will now denote random variables rather than point
processes, and accordingly $\overline N_{r_n} = \overline N_n([0, r_n/n])$ and $\widehat N_{r_n} = \widehat N_n([0, r_n/n])$.

Let us consider the compound Poisson random variable

$$N_n^* = \sum_{i=1}^{N^*} Z_i^*(v_n),$$

where $N^*$ denotes a Poisson random variable with mean $c_n\eta\nu$ and the random variables $Z_i^*(v_n)$ are
independent and identically distributed with the same distribution as $Z_1(v_n)$, that is, distribution $\pi_n^*$
given in (2.2).

In the following results we prove, with arguments similar to the ones used in Nandagopalan [15], that
under suitable conditions the limiting distributions of $\overline N_n$ and $\widehat N_n$ are identical to, respectively, those of
$N_n^*$ and $N^*$.

Theorem 3.1 Let $\{v_n\}_{n\ge 1}$ be a sequence of levels satisfying (3.1) and suppose there exists a sequence
$\{l_n\}_{n\ge 1}$ for which (3.2) holds for such levels.
If 



E 



0(1) 



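(3.3)

uniformly in $j = 1, \ldots, r_n$ ($r_n = [n/k_n]$), then

$$E\big[\exp(itc_n^{-1}\overline N_n)\big] - E\big[\exp(itc_n^{-1}N_n^*)\big] \xrightarrow[n\to+\infty]{} 0 \tag{3.4}$$

and

$$E\big[\exp(itc_n^{-1}\widehat N_n)\big] - E\big[\exp(itc_n^{-1}N^*)\big] \xrightarrow[n\to+\infty]{} 0 \tag{3.5}$$

for each $t \in \mathbb{R}$.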



Proof: To prove (3.4), let $\varphi_{Z_1}(t) = E[\exp(itZ_1)]$ denote the characteristic function of the random
variable $Z_1$ with distribution $\pi_n^*(\cdot)$ given in (2.2). Since $N_n^*$ is a compound Poisson variable we have

$$E\big[\exp(itc_n^{-1}N_n^*)\big] = \exp\big(-c_n\eta\nu(1 - \varphi_{Z_1}(tc_n^{-1}))\big)$$

and hence it suffices to show that

$$E\big[\exp(itc_n^{-1}\overline N_n)\big] = \exp\big(-c_n\eta\nu(1 - \varphi_{Z_1}(tc_n^{-1}))\big) + o(1). \tag{3.6}$$

Nevertheless, from Lemma 3.1, we can write

$$E\big[\exp(itc_n^{-1}\overline N_n)\big] = \Big(E\big[\exp(itc_n^{-1}\overline N_{r_n})\big]\Big)^{k_n} + o(1).$$

Now following the steps of the proof of Proposition 5.3.1 of Nandagopalan [15] we obtain

$$E\big[\exp(itc_n^{-1}\overline N_{r_n})\big] = 1 - \frac{c_n}{k_n}(\eta\nu + o(1)) \times (1 - \varphi_{Z_1}(tc_n^{-1})) + o(k_n^{-1}),$$

since (3.3) holds and $\frac{r_n}{c_n}k_nP(\widetilde N_{1,1} = 0,\ X_3 \le v_n < X_4) = \eta\nu + o(1)$ as a consequence of condition $\widetilde D^{(3)}(\mathbf{v})$.
Moreover, since $|1 - \varphi_{Z_1}(tc_n^{-1})| \le |tc_n^{-1}|E[Z_1(v_n)]$ and $E[Z_1(v_n)] \to \eta^{-1}$, as $n \to +\infty$, we have

$$E\big[\exp(itc_n^{-1}\overline N_n)\big] = \Big(1 - \frac{c_n}{k_n}\eta\nu(1 - \varphi_{Z_1}(tc_n^{-1})) + o(k_n^{-1})\Big)^{k_n} + o(1),$$

from which (3.6) follows, and this proves (3.4).

Convergence (3.5) can be established with similar arguments. □



Remark 7 If in the previous result we consider normalized levels $\mathbf{u}^{(\nu)}$ then both convergences (3.4) and
(3.5) would hold without the need of the normalizing constant $c_n$, $N^*$ being, in this case, a Poisson random
variable with mean $\eta\nu$.

The next result justifies the need to consider lower levels $v_n$ satisfying (3.1) in order to guarantee the
consistency of the estimator.

Theorem 3.2 Suppose $\{v_n\}_{n\ge 1}$ is a sequence of levels satisfying (3.1). If

$$E\big[Z_1^*(v_n)\,\mathbb{1}_{\{Z_1^*(v_n) > c_n\}}\big] \xrightarrow[n\to+\infty]{} 0 \tag{3.7}$$

and

$$c_n^{-1}E\big[(Z_1^*(v_n))^2\,\mathbb{1}_{\{Z_1^*(v_n) \le c_n\}}\big] \xrightarrow[n\to+\infty]{} 0, \tag{3.8}$$

then

$$c_n^{-1}N_n^* \xrightarrow[n\to+\infty]{P} \nu.$$

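Proof: Start by noting that convergence (3.7) along with the fact that $E[Z_1(v_n)] \to \eta^{-1}$, as $n \to +\infty$, implies
that $\eta\nu E[Z_1^*(v_n)\mathbb{1}_{\{Z_1^*(v_n) \le c_n\}}] \to \nu$, as $n \to +\infty$. It is then sufficient to show that

$$c_n^{-1}N_n^* - \eta\nu E\big[Z_1^*(v_n)\,\mathbb{1}_{\{Z_1^*(v_n) \le c_n\}}\big] \xrightarrow[n\to+\infty]{P} 0.$$

Now, invoking arguments similar to the ones used by Nandagopalan [15] (Proposition 5.3.2) and
considering $\gamma_n = E[Z_1^*(v_n)\mathbb{1}_{\{Z_1^*(v_n) \le c_n\}}]$, we obtain, for any $\epsilon > 0$ and $\delta < \epsilon/\max\{\gamma_n\}$,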

P{\c-'K-V^ln\>2e) 

E\iZUVn)'S^iZ''(v )<c \?] 

< (5 + r,u)E[Znvn)n{zn^.)^c.}] + c„(r?. -J) +T^/y+H)^ + " > ^'-^^ 

The first two terms tend to zero by (3.7) and (3.8), respectively, while the third term tends to zero
since $N^*$ is a Poisson random variable with mean $c_n\eta\nu$. Hence, since $\epsilon > 0$ is arbitrary, we
obtain $c_n^{-1}N_n^* - \eta\nu\gamma_n \xrightarrow[n\to+\infty]{P} 0$, as required. □

We proved in Theorem 3.1 that $c_n^{-1}\overline N_n$ and $c_n^{-1}\widehat N_n$ have the same asymptotic distribution as, respectively,
$c_n^{-1}N_n^*$ and $c_n^{-1}N^*$, and in Theorem 3.2 we proved that $c_n^{-1}N_n^* \xrightarrow[n\to+\infty]{P} \nu$. Moreover, since $N^*$ is a
Poisson random variable with mean $c_n\eta\nu$ it follows that $c_n^{-1}N^* \xrightarrow[n\to+\infty]{P} \eta\nu$. The weak consistency of $\widetilde\eta_n$
is now an immediate consequence, summarized in the next result.

Corollary 3.1 If the conditions of Theorems 3.1 and 3.2 hold then

$$\widetilde\eta_n \xrightarrow[n\to+\infty]{P} \eta.$$



4 Asymptotic Normality 

Imposing additional conditions on the limiting behaviour of the first and second moments of the
variable $Z_1(v_n)$ we obtain in this section the asymptotic normality of our estimator $\widetilde\eta_n$.

The proof of the next result, which is essential in obtaining what follows, shall be omitted since it is
similar to the proof of Proposition 5.4.1 of Nandagopalan [15].

Theorem 4.1 Suppose that $\{v_n\}_{n\ge 1}$ is a sequence of levels satisfying (3.1) and suppose there exists a
sequence $\{l_n\}_{n\ge 1}$ for which (3.2) holds for such levels.
If

E[{Z,{Vn)?^^fi^^^,^]=0{l) (4.1) 

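uniformly in $j = 1, \ldots, r_n$ ($r_n = [n/k_n]$) and

$$\sigma_n^2 = E[(Z_1(v_n))^2]$$

is a bounded sequence, then

$$c_n^{-1/2}\begin{pmatrix} \overline N_n - E[\overline N_n] \\ \widehat N_n - E[\widehat N_n] \end{pmatrix}$$

converges in distribution if and only if

$$c_n^{-1/2}\begin{pmatrix} N_n^* - E[N_n^*] \\ N^* - E[N^*] \end{pmatrix}$$

does and, in this case, the limits coincide.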



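Theorem 4.2 Suppose that $\{v_n\}_{n\ge 1}$ is a sequence of levels satisfying (3.1) and suppose there exists a
sequence $\{l_n\}_{n\ge 1}$ for which (3.2) holds for such levels.
If

$$\sigma_n^2 = E[(Z_1(v_n))^2] \xrightarrow[n\to+\infty]{} \sigma^2 < +\infty \tag{4.2}$$

and, for each $\epsilon > 0$,

$$E\big[(Z_1^*(v_n))^2\,\mathbb{1}_{\{(Z_1^*(v_n))^2 > \epsilon c_n\}}\big] \xrightarrow[n\to+\infty]{} 0, \tag{4.3}$$

then

$$c_n^{-1/2}\begin{pmatrix} N_n^* - E[N_n^*] \\ N^* - E[N^*] \end{pmatrix} \xrightarrow[n\to+\infty]{d} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \eta\nu\sigma^2 & \nu \\ \nu & \eta\nu \end{pmatrix}\right).$$

Proof: Since $N_n^*$ is a compound Poisson random variable it holds that $E[N_n^*] = E[N^*]E[Z_1(v_n)]$ and we can
write

$$N_n^* - E[N_n^*] = (N_n^* - N^*E[Z_1(v_n)]) + E[Z_1(v_n)](N^* - E[N^*]). \tag{4.4}$$

From Lindeberg's condition (4.3) the central limit theorem holds, that is,

$$c_n^{-1/2}(N_n^* - N^*E[Z_1(v_n)]) \xrightarrow[n\to+\infty]{d} N\Big(0,\ \eta\nu\Big(\sigma^2 - \frac{1}{\eta^2}\Big)\Big)$$

and therefore

$$E\big[\exp\{itc_n^{-1/2}(N_n^* - N^*E[Z_1(v_n)])\}\big] \xrightarrow[n\to+\infty]{} \exp\Big(-\frac{t^2}{2}\eta\nu\Big(\sigma^2 - \frac{1}{\eta^2}\Big)\Big).$$

Also, $c_n^{-1/2}(N^* - E[N^*]) \xrightarrow[n\to+\infty]{d} N(0, \eta\nu)$, so that

$$E\big[\exp\{isc_n^{-1/2}(N^* - E[N^*])\}\big] = \exp\big(-c_n\eta\nu(1 - e^{isc_n^{-1/2}} + isc_n^{-1/2})\big) \xrightarrow[n\to+\infty]{} \exp\Big(-\frac{s^2\eta\nu}{2}\Big).$$

Now

$$E\big[\exp\{itc_n^{-1/2}(N_n^* - N^*E[Z_1(v_n)]) + isc_n^{-1/2}(N^* - E[N^*])\}\big]
= \exp\Big(-c_n\eta\nu\big(1 - \varphi_{Z_1}(tc_n^{-1/2})\,e^{-itc_n^{-1/2}E[Z_1(v_n)]}\,e^{isc_n^{-1/2}} + isc_n^{-1/2}\big)\Big) \tag{4.5}$$

with $\varphi_{Z_1}(tc_n^{-1/2}) = 1 + itc_n^{-1/2}E[Z_1(v_n)] + \rho_n^{(1)}$ and $e^{-itc_n^{-1/2}E[Z_1(v_n)]} = 1 - itc_n^{-1/2}E[Z_1(v_n)] + \rho_n^{(2)}$, where
the remainders $\rho_n^{(1)}$ and $\rho_n^{(2)}$ are of order $c_n^{-1}$.
Thus,

$$c_n\Big[\big(1 - \varphi_{Z_1}(tc_n^{-1/2})\,e^{-itc_n^{-1/2}E[Z_1(v_n)]}\big) + \big(1 - e^{isc_n^{-1/2}} + isc_n^{-1/2}\big) + o(c_n^{-1})\Big]
\xrightarrow[n\to+\infty]{} \frac{t^2}{2}\Big(\sigma^2 - \frac{1}{\eta^2}\Big) + \frac{s^2}{2}.$$

This along with (4.5) implies that

$$c_n^{-1/2}\begin{pmatrix} N_n^* - N^*E[Z_1(v_n)] \\ N^* - E[N^*] \end{pmatrix} \xrightarrow[n\to+\infty]{d} N\left(\begin{pmatrix} 0 \\ 0 \end{pmatrix}, \begin{pmatrix} \eta\nu\big(\sigma^2 - \frac{1}{\eta^2}\big) & 0 \\ 0 & \eta\nu \end{pmatrix}\right),$$

which together with (4.4) and the fact that $E[Z_1(v_n)] \to 1/\eta$, as $n \to +\infty$, yields the result.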

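□

The asymptotic normality of $\widetilde\eta_n$ is now an immediate consequence of the previous results.

Corollary 4.1 If the conditions of Theorems 4.1 and 4.2 hold then

$$\sqrt{c_n}\,(\widetilde\eta_n - \eta_n) \xrightarrow[n\to+\infty]{d} N\Big(0,\ \frac{\eta}{\nu}(\eta^2\sigma^2 - 1)\Big),$$

where $\eta_n = E[\widehat N_n]/E[\overline N_n]$.

Proof: Conditions (4.1)-(4.3) imply the conditions of Theorems 3.1 and 3.2, hence $c_n^{-1}\overline N_n \xrightarrow[n\to+\infty]{P} \nu$. On
the other hand condition $\widetilde D^{(3)}(\mathbf{v})$ implies that $\eta_n \to \eta$, as $n \to +\infty$. The result now follows from the fact that

$$\sqrt{c_n}\,(\widetilde\eta_n - \eta_n) = \frac{c_n^{-1/2}\big[(\widehat N_n - E[\widehat N_n]) - \eta_n(\overline N_n - E[\overline N_n])\big]}{c_n^{-1}\overline N_n}.$$

□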
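The limiting variance in Corollary 4.1 can be cross-checked by a short delta-method computation. The following is a sketch under stated assumptions, not part of the original proof. Write $A_n$ for the number of non-upcrossings followed by an upcrossing and $B_n$ for the sum of the run lengths (numerator and denominator of the estimator; the symbols $A_n$, $B_n$ and $g$ are introduced here only for this check). Assume that $c_n^{-1/2}(A_n - E[A_n],\ B_n - E[B_n])$ is asymptotically normal with the compound Poisson covariance structure, that is, limiting variances $\eta\nu$ and $\eta\nu\sigma^2$ and limiting covariance $\nu$, and that $c_n^{-1}A_n \to \eta\nu$ and $c_n^{-1}B_n \to \nu$ in probability.

```latex
\begin{align*}
g(a,b) &= \frac{a}{b}, \qquad
\nabla g(\eta\nu,\nu) = \Big(\frac{1}{\nu},\; -\frac{\eta}{\nu}\Big),\\[4pt]
\nabla g^{\top}
\begin{pmatrix} \eta\nu & \nu \\ \nu & \eta\nu\sigma^{2} \end{pmatrix}
\nabla g
&= \frac{\eta\nu}{\nu^{2}} \;-\; \frac{2\eta\nu}{\nu^{2}} \;+\; \frac{\eta^{2}\cdot\eta\nu\sigma^{2}}{\nu^{2}}
= \frac{\eta}{\nu}\big(\eta^{2}\sigma^{2}-1\big).
\end{align*}
```

Under these assumptions the ratio $A_n/B_n$, centred and scaled by $\sqrt{c_n}$, has limiting variance $(\eta/\nu)(\eta^2\sigma^2 - 1)$, which decreases as $\nu$ grows.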

Remark 8 Since the variance of $\widetilde\eta_n$ is of order $1/\nu$, if we kept $\nu$ fixed as in (1.2) we could not guarantee
the consistency of $\widetilde\eta_n$. Hence the need to assume that $\nu = \nu_n \to +\infty$ as $n \to +\infty$.

To conclude we present a result that enables one to construct approximate confidence intervals or a
hypothesis test regarding $\eta$, i.e., to determine the extent of clustering of upcrossings of high levels in the
observed data. From a practical viewpoint it is more useful than Corollary 4.1.

Corollary 4.2 Suppose that the conditions of Corollary 4.1 hold,

$$\sqrt{c_n}\,(\eta_n - \eta) \xrightarrow[n\to+\infty]{} 0$$

and

$$c_n^{-1}E\big[(Z_1^*(v_n))^4\,\mathbb{1}_{\{(Z_1^*(v_n))^2 \le c_n\}}\big] \xrightarrow[n\to+\infty]{} 0.$$

Then



Where al = V^-^. ^.+2<^n<x,^3} _ 

Proof: Straightforward from Corollary 4.1 and the fact that Theorems 3.1 and 3.2 imply that 

n— 3 



=1 

□ 



We recall that the properties proved for the estimator $\widetilde\eta_n$ remain valid for the estimator $\widehat\eta_n$ used in
the remainder of this paper.

5 The choice of the levels 

In Section 3 it was shown that for the runs estimator of the upcrossings index consistency can be
achieved by the use of somewhat lower levels $v_n$ satisfying (3.1). Thus, the precise choice of $v_n$ depends
on the knowledge of the joint distribution of $(X_1, X_2)$, typically unknown. These deterministic levels will
have to be replaced, in practical situations, by random levels suggested by the relation

$$\frac{n}{c_n}P(X_1 \le v_n < X_2) \sim \nu. \tag{5.6}$$

This relation basically states that the expected number of upcrossings of the level $v_n$ is approximately $c_n\nu$.
Contrary to the random levels used in the estimation of the extremal index, these random levels cannot
be represented by an appropriate order statistic. Nevertheless, for levels $v_n$ such that $\frac{n}{c_n}P(X_1 > v_n) \sim \tau$
the expected number of upcrossings will necessarily be smaller than or equal to $c_n\tau$. It then seems natural
to also replace $v_n$ by the appropriate order statistic used in the estimation of the extremal index (for
example, Nandagopalan [15]), namely $\widehat v_n = X_{n-[c_n\tau]:n}$. With this consideration we obtain the following
estimator for $\eta$

$$\widehat\eta_n(\widehat v_n) = \frac{\widehat N_n(\widehat v_n)}{\widetilde N_n(\widehat v_n)},$$

where $\widehat N_n(\widehat v_n) = \widehat N_n([0,1])$ and $\widetilde N_n(\widehat v_n) = \widetilde N_n([0,1])$ with $u_n = \widehat v_n$.

The weak consistency of this estimator can be obtained by showing that it is closely approximated
by the corresponding estimates based on the non-random levels $v_n$. For this, let us start by noting that for
two levels $v_1$ and $v_2$, such that $v_1 \ne v_2$, and fixed $n$ we have

$$|\widetilde N_n(v_1) - \widetilde N_n(v_2)| \le |N_n(v_1) - N_n(v_2)| \tag{5.7}$$

and

$$|\widehat N_n(v_1) - \widehat N_n(v_2)| \le |N_n(v_1) - N_n(v_2)|, \tag{5.8}$$

where $N_n(v_n) = \sum_{i=1}^{n} \mathbb{1}_{\{X_i > v_n\}}$.

Theorem 5.1 Suppose that for each $\nu > 0$ there exists $u^{(\nu)}_{[n/c_n]} = u^{(\tau)}_{[n/c_n]}$ for some $\tau > 0$, the conditions of
Proposition 5.3.1 and 5.3.2 in Nandagopalan [15] hold for each $\tau'$ in a neighbourhood of $\tau$ and conditions
(3.3), (3.7) and (3.8) hold for each $\nu'$ in a neighbourhood of $\nu$. Then $\widehat\eta_n(\widehat v_n) \xrightarrow[n\to+\infty]{P} \eta$.

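Proof: For $\epsilon > 0$,

$$P(|c_n^{-1}\widetilde N_n(\widehat v_n) - \nu| > 6\epsilon)
\le P(|c_n^{-1}([c_n\tau] - N_n(v_n))| > 3\epsilon) + P(|c_n^{-1}\widetilde N_n(v_n) - \nu| > 3\epsilon) \tag{5.9}$$

by (5.7), with $v_1$ and $v_2$ replaced by $\widehat v_n$ and $v_n = u^{(\tau+\epsilon)}_{[n/c_n]}$. If $\epsilon$ is sufficiently small, then it follows
from the results in [15] and in Section 3 that $c_n^{-1}N_n(v_n) \xrightarrow[n\to+\infty]{P} \tau + \epsilon$ and $c_n^{-1}\widetilde N_n(v_n) \xrightarrow[n\to+\infty]{P} \nu + \epsilon$.
Therefore, since $c_n^{-1}[c_n\tau] \to \tau$, as $n \to +\infty$, for large $n$ (5.9) is dominated by

$$P(|c_n^{-1}N_n(v_n) - (\tau + \epsilon)| > \epsilon) + P(|c_n^{-1}\widetilde N_n(v_n) - (\nu + \epsilon)| > \epsilon) \xrightarrow[n\to+\infty]{} 0,$$

which proves that $c_n^{-1}\widetilde N_n(\widehat v_n) \xrightarrow[n\to+\infty]{P} \nu$. On the other hand, for $\epsilon > 0$, we have from (5.8), with $v_1$ and
$v_2$ replaced by $\widehat v_n$ and $v_n = u^{(\tau+\epsilon)}_{[n/c_n]}$, that

$$P(|c_n^{-1}\widehat N_n(\widehat v_n) - \eta\nu| > 6\epsilon)
\le P(|c_n^{-1}([c_n\tau] - N_n(v_n))| > 3\epsilon) + P(|c_n^{-1}\widehat N_n(v_n) - \eta\nu| > 3\epsilon). \tag{5.10}$$

Hence, if $\epsilon$ is sufficiently small we have $c_n^{-1}\widehat N_n(v_n) \xrightarrow[n\to+\infty]{P} \eta(\nu + \epsilon)$ (results of Section 3) and we may
conclude that (5.10) is dominated by

$$P(|c_n^{-1}N_n(v_n) - (\tau + \epsilon)| > \epsilon) + P(|c_n^{-1}\widehat N_n(v_n) - \eta(\nu + \epsilon)| > \epsilon) \xrightarrow[n\to+\infty]{} 0,$$

which proves that $c_n^{-1}\widehat N_n(\widehat v_n) \xrightarrow[n\to+\infty]{P} \eta\nu$.

The result is now an immediate consequence of the two convergences established. □

The proof of the asymptotic normality of $\widehat\eta_n(\widehat v_n)$ remains an open problem.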

6 Simulations: Some examples 

This section studies some examples of sequences exhibiting clustering of upcrossings. We illustrate
the performance of the runs estimator of the upcrossings index with the max-autoregressive (ARMAX)
process of Ferreira [1] and the negatively correlated first order autoregressive (AR(1)) process of Chernick
et al. [2], for which the values of the upcrossings index $\eta$ are well known.

6.1 ARMAX process 

Let us consider the ARMAX process of Ferreira [1],

$$X_n = \max\{Y_n, Y_{n-2}, Y_{n-3}\}, \quad n \ge 1, \tag{6.1}$$

where $\{Y_n\}_{n\ge -2}$ is a sequence of independent random variables uniformly distributed on $[0,1]$. In Figure 1
we present a sample path of this process.

Figure 1: Sample path of the stationary ARMAX process in (6.1).

Condition $\widetilde D^{(3)}(\mathbf{u}^{(\tau)})$ holds for this stationary sequence, with $\mathbf{u}^{(\tau)} = \{u_n = 1 - \tau/n\}_{n\ge 1}$, $\tau > 0$. It has
extremal index $\theta = 1/3$ and upcrossings index $\eta = 1/2$. Furthermore, it holds that
$\lim_{n\to+\infty} nP(X_1 > u_n) = 3\tau$ and $\lim_{n\to+\infty} nP(X_1 \le u_n < X_2) = 2\tau$, so the levels $u_n$ are simultaneously
normalized for exceedances and upcrossings. Note that this implies that for a sample $(X_1, \ldots, X_n)$ of
(6.1) with $n$ sufficiently large, the number of upcrossings of a high level is approximately 2/3 of the
number of exceedances of the same level.
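These two properties can be checked empirically with a small simulation. The sketch below is illustrative only: the sample size, the value of $\tau$, the seed and the function names are arbitrary assumptions, and the level is the deterministic $u_n = 1 - \tau/n$ rather than a data-driven one.

```python
import random


def simulate_armax(n, rng):
    """X_m = max(Y_m, Y_{m-2}, Y_{m-3}), m = 1..n, with Y i.i.d. U(0,1).
    y[t] stores Y_{t-2}, so Y_m is y[m + 2]."""
    y = [rng.random() for _ in range(n + 3)]
    return [max(y[m + 2], y[m], y[m - 1]) for m in range(1, n + 1)]


def runs_estimator(x, u):
    """Runs starting after a non-upcrossing, divided by the number of upcrossings."""
    up = [x[i] <= u < x[i + 1] for i in range(len(x) - 1)]
    total = sum(up)
    if total == 0:
        return None
    starts = sum(1 for i in range(len(up) - 2) if not up[i] and up[i + 2])
    return starts / total


rng = random.Random(2024)           # fixed seed, only for reproducibility
n, tau = 200_000, 500.0             # illustrative values
x = simulate_armax(n, rng)
u = 1 - tau / n
exceedances = sum(1 for v in x if v > u)
upcrossings = sum(1 for i in range(n - 1) if x[i] <= u < x[i + 1])
print("upcrossings / exceedances:", upcrossings / exceedances)
print("runs estimate of eta:", runs_estimator(x, u))
```

With these settings the ratio of counts should be close to 2/3 and the runs estimate close to 1/2, up to simulation noise.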

We shall consider in the following simulations the level $u = X_{n-k:n}$ in (2.3), corresponding to the
$(k+1)$th top order statistic associated with the random sample $(X_1, \ldots, X_n)$ of (6.1), commonly used in
the estimation of the extremal index (see, for example, Nandagopalan [15] and Gomes et al. [7]). Hence,
the upcrossings index estimator is now a function of $k$, $\widehat\eta_n(k)$.
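Viewing the estimator as a function of $k$ amounts to plugging the $(k+1)$th largest observation in as the threshold. A minimal sketch, with hypothetical function names and assuming $0 \le k \le n - 1$:

```python
def runs_estimator(x, u):
    """Runs starting after a non-upcrossing, divided by the number of upcrossings."""
    up = [x[i] <= u < x[i + 1] for i in range(len(x) - 1)]
    total = sum(up)
    if total == 0:
        return None
    starts = sum(1 for i in range(len(up) - 2) if not up[i] and up[i + 2])
    return starts / total


def order_statistic_level(x, k):
    """X_{n-k:n}: the (k + 1)-th largest value of the sample."""
    return sorted(x)[len(x) - k - 1]


def eta_hat_of_k(x, k):
    """Runs estimator evaluated at the random level X_{n-k:n}."""
    return runs_estimator(x, order_statistic_level(x, k))


x = [0.1, 0.5, 0.3, 0.9, 0.2, 0.8, 0.4, 0.7, 0.6, 0.05]
for k in (1, 3, 5):
    print(k, order_statistic_level(x, k), eta_hat_of_k(x, k))
```

When the sample has no ties, exactly $k$ observations exceed the chosen level, so small $k$ corresponds to high thresholds.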
Figure 2: Sample path of the upcrossings index estimator as a function of $k$ for a sample of size $n = 5000$
from the stationary ARMAX process in (6.1), in a linear scale (left) and in a logarithmic scale (right).

In Figure 2 we present a sample path of the estimator $\widehat\eta_n$ in (2.3) as a function of $k \ge 1$, for a sample
of size $n = 5000$, in a linear scale and in a logarithmic scale.

As we can see from Figure 2, the logarithmic scale highlights the behaviour of $\widehat\eta_n$ for small values of
$k$, giving better insight into its stability region around $\eta = 0.5$.

For samples of size $n = 100, 200, 500, 1000, 2000, 5000$ and $10000$ from the ARMAX process in (6.1),
we have performed a multi-sample Monte Carlo simulation with 5000 runs and 10 replicates. For details
on multi-sample simulation see, for instance, Gomes and Oliveira [8]. We have simulated, for the estimator
$\widehat\eta_n(k)$ in (2.3), the mean value (E), the mean squared error (MSE) and the optimal sample fraction $k_0/n$,
with $k_0 := \arg\min_k MSE[\widehat\eta_n(k)]$.
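The quantities E, MSE and $k_0$ can be approximated with a plain Monte Carlo loop. The sketch below is a scaled-down, single-sample illustration (far fewer runs than in the study, no replicates, a coarse grid of $k$ values); all names, the seed and the grid are assumptions made for this example.

```python
import random


def simulate_armax(n, rng):
    """X_m = max(Y_m, Y_{m-2}, Y_{m-3}) with Y i.i.d. U(0,1); y[t] stores Y_{t-2}."""
    y = [rng.random() for _ in range(n + 3)]
    return [max(y[m + 2], y[m], y[m - 1]) for m in range(1, n + 1)]


def runs_estimator(x, u):
    """Runs starting after a non-upcrossing, divided by the number of upcrossings."""
    up = [x[i] <= u < x[i + 1] for i in range(len(x) - 1)]
    total = sum(up)
    if total == 0:
        return None
    starts = sum(1 for i in range(len(up) - 2) if not up[i] and up[i + 2])
    return starts / total


def monte_carlo(n, ks, runs, eta, seed):
    """Simulated mean and MSE of the estimator at level X_{n-k:n} for each k in ks,
    and the k minimizing the simulated MSE."""
    rng = random.Random(seed)
    total = {k: 0.0 for k in ks}
    total_sq_err = {k: 0.0 for k in ks}
    used = {k: 0 for k in ks}
    for _ in range(runs):
        x = simulate_armax(n, rng)
        ordered = sorted(x)
        for k in ks:
            est = runs_estimator(x, ordered[n - k - 1])
            if est is None:          # level never upcrossed in this sample
                continue
            total[k] += est
            total_sq_err[k] += (est - eta) ** 2
            used[k] += 1
    mean = {k: total[k] / used[k] for k in ks if used[k]}
    mse = {k: total_sq_err[k] / used[k] for k in ks if used[k]}
    k0 = min(mse, key=mse.get)
    return mean, mse, k0


mean, mse, k0 = monte_carlo(n=1000, ks=[10, 25, 50, 100, 200, 400],
                            runs=200, eta=0.5, seed=1)
for k in sorted(mse):
    print(k, round(mean[k], 3), round(mse[k], 5))
print("k0 =", k0, " k0/n =", k0 / 1000)
```

A finer grid of $k$ values and more runs would be needed to reproduce curves like those summarized in the study.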

In Figure 3, we illustrate the estimated mean values and MSE's of $\widehat\eta_n(k)$ in (2.3), for
a sample of size $n = 5000$ from the ARMAX process in (6.1), with upcrossings index $\eta = 0.5$. In Table
1 we present the main distributional properties of the estimator under study with the associated 95%
confidence intervals (see Gomes and Oliveira [8]).

Figure 3: Estimated mean values (left) and mean squared errors (right), for samples of size $n = 5000$
from the ARMAX process in (6.1) ($\eta = 0.5$).

Table 1: Optimal sample fractions, mean values and mean squared errors of the estimator at its optimal
levels, for the ARMAX process, with $\eta = 0.5$.



6.2 AR(1) process 

Let us consider the negatively correlated uniform AR(1) process of Chernick et al. [2],

$$X_n = -\frac{1}{r}X_{n-1} + \epsilon_n, \quad n \ge 1, \tag{6.2}$$

where $\{\epsilon_n\}_{n\ge 1}$ is a sequence of independent and identically distributed random variables such that, for
a fixed integer $r \ge 2$, $\epsilon_n \sim U\{\frac{1}{r}, \ldots, \frac{r-1}{r}, 1\}$ and $X_0 \sim U(0,1)$ independent of $\epsilon_n$.

Condition $\widetilde D^{(3)}(\mathbf{u}^{(\tau)})$ also holds for this stationary sequence, with $\mathbf{u}^{(\tau)} = \{u_n = 1 - \tau/n\}_{n\ge 1}$, $\tau > 0$;
moreover $\lim_{n\to+\infty} nP(X_1 > u_n) = \tau$ and $\eta = \theta = 1 - \frac{1}{r^2}$ (see Sebastião et al. [18]). Condition
$D''(\mathbf{u})$ typically doesn't hold for these sequences since they tend to oscillate rapidly near extremes. To
illustrate this characteristic we present in Figure 4 sample paths of the negatively correlated uniform
AR(1) processes for $r = 2$, $r = 3$ and $r = 5$ ($\eta = \theta = 0.75$, $\eta = \theta = 0.89$ and $\eta = \theta = 0.96$, respectively).
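A small simulation gives a feel for this process and for the value $1 - 1/r^2$. The sketch below is illustrative only: it assumes the innovations are uniform on $\{1/r, \ldots, 1\}$, uses the deterministic level $u_n = 1 - \tau/n$ with arbitrary $n$ and $\tau$, and the function names are hypothetical.

```python
import random


def simulate_ar1(n, r, rng):
    """X_m = -X_{m-1}/r + eps_m with eps_m uniform on {1/r, ..., 1}, X_0 ~ U(0,1)."""
    x_prev = rng.random()
    path = []
    for _ in range(n):
        eps = rng.randint(1, r) / r   # randint is inclusive at both ends
        x_prev = -x_prev / r + eps
        path.append(x_prev)
    return path


def runs_estimator(x, u):
    """Runs starting after a non-upcrossing, divided by the number of upcrossings."""
    up = [x[i] <= u < x[i + 1] for i in range(len(x) - 1)]
    total = sum(up)
    if total == 0:
        return None
    starts = sum(1 for i in range(len(up) - 2) if not up[i] and up[i + 2])
    return starts / total


rng = random.Random(7)               # fixed seed, only for reproducibility
n, tau = 200_000, 2000.0             # illustrative values
for r in (2, 3, 5):
    x = simulate_ar1(n, r, rng)
    print(r, 1 - 1 / r ** 2, runs_estimator(x, 1 - tau / n))
```

For each $r$ the printed estimate can be compared with the theoretical value in the second column.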



Figure 4: Sample paths of the negatively correlated uniform AR(1) process, with panel titles $\eta = 0.75$,
$\eta = 0.89$ and $\eta = 0.96$ (the last one, for $r = 5$, on the right).

We present, in Figures 5-7, the estimated mean values and MSE's of $\widehat\eta_n(k)$ in (2.3), for a sample of size
$n = 5000$ from the AR(1) process in (6.2), with upcrossings index $\eta = 0.75$, $0.89$ and $0.96$, respectively.

In Table 2 we present the main distributional properties of the estimator under study with the
associated 95% confidence intervals.

6.3 Some overall conclusions 

• The sample paths of the runs estimator of the upcrossings index have very different patterns for
the ARMAX and the AR(1) process. Whilst for the ARMAX process the estimates increase with
the value of $k$, for the AR(1) process the estimates, as a function of $k$, have an almost symmetric
pattern, decreasing for smaller values of $k$.

• For the ARMAX process, smaller values of $\eta$ tend to be associated with a wider "bathtub" pattern
of the mean squared error as a function of $k$.

• The runs estimator of the upcrossings index has some mean value stability around the target $\eta$ for
small values of $k$; for such values the mean squared error exhibits a "bathtub" pattern, although
not a very wide one.

n       k     k/n      Mean value            Mean squared error
1000    33    0.033    0.95687 ± 0.00054     0.00130 ± 0.000038
2000    67    0.034    0.95842 ± 0.00026     0.00061 ± 0.000011
              0.035    0.95949 ± 0.00017     0.00023 ± 0.000004

Table 2: Optimal sample fractions, mean values and mean squared errors of the estimator at its optimal
levels, for the AR(1) process, with $\eta = 0.75$, $\eta = 0.89$ and $\eta = 0.96$.

• For small values of k, or equivalently high levels, we obtain good estimates for $\eta$. We remark that
the choice of the number of top order statistics is a complex problem in extreme value applications.

The sensitivity of the mean value to changes in k highlights the need to study the bias
properties of this estimator. This is clear from Figure 6, where we can see that for the ARMAX
process the behaviour of the mean squared error is almost entirely determined by the bias (the two
curves overlap), since the variance is always very small. The same behaviour holds for the AR(1) process.
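The overlap of the mean squared error and bias curves can be read through the standard decomposition of the mean squared error into variance plus squared bias. A minimal Python sketch of how the three quantities could be estimated from replicated estimates at a fixed k is given below; the helper name `mse_decomposition` and the replicate values are hypothetical and do not come from the simulations reported here.

```python
def mse_decomposition(estimates, target):
    """Return (mse, variance, squared bias) of replicated estimates of `target`."""
    m = len(estimates)
    mean = sum(estimates) / m
    # Divisor m (not m - 1) so that mse == variance + bias_sq holds exactly.
    variance = sum((e - mean) ** 2 for e in estimates) / m
    bias_sq = (mean - target) ** 2
    mse = sum((e - target) ** 2 for e in estimates) / m
    return mse, variance, bias_sq


if __name__ == "__main__":
    # Hypothetical replicates of an estimator whose target value is 0.5.
    replicates = [0.52, 0.55, 0.53, 0.56]
    mse, variance, bias_sq = mse_decomposition(replicates, target=0.5)
    print(mse, variance, bias_sq)
```

When the variance term is negligible, as described above, the mean squared error and the squared bias nearly coincide, which is why the two curves are hard to tell apart.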



Figure 6: Estimated mean squared error (solid line), estimated variance (dashed line) and estimated squared
bias (dotted line), for samples of size n = 5000 from the ARMAX process in (6.1).
7 Case-studies



7.1 Ozone pollution 

We now consider the performance of the above-mentioned estimator in the analysis of n = 120 weekly
maxima of hourly averages of ozone concentrations, measured in parts per million, in the San Francisco
bay area, San Jose. These data are available in the package Xtremes (Reiss and Thomas [16]) and have
already been studied, for instance, in Gomes et al. [7] when estimating the extremal index. In Figure 7
we picture the data over the observation period.







Figure 7: Weekly maxima of hourly averages of ozone concentrations measured in parts per million, in 
the San Francisco bay area, San Jose 



Since in practice it is not yet possible to verify condition $\widetilde{D}^{(3)}$, we assume that it holds: as stated
in Gomes et al. [7], most of the parametric models that adequately fit this type of meteorological data
satisfy condition $D''$ and therefore also satisfy condition $\widetilde{D}^{(3)}$.

In Figure 8 we picture the sample path of $\eta_n(k)$ as a function of k in a linear scale (left) and in a
logarithmic scale (right).




Figure 8: Estimates of the upcrossings index as a function of k for the weekly maxima of hourly averages 
of ozone concentrations measured in parts per million, in the San Francisco bay area, San Jose, in a linear 
scale (left) and in a logarithmic scale (right). 
The stability around one for small values of k agrees with the fact that $\eta = 1$, since from [7] condition
$D''$ holds. Nevertheless, the small number of observations makes it difficult to rely on such a point
estimate, since the number of upcrossings of a high level, and consequently the number of
non-upcrossings followed by an upcrossing, will always be very small.
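To see how little information a short series carries at high levels, one can count the upcrossings of a level set at an upper order statistic. The Python sketch below assumes that the level u is taken as the (k+1)-th largest observation, so that exactly k observations exceed it when there are no ties, and that an upcrossing at i is the event $X_i \leq u < X_{i+1}$. The thresholding convention, the function name `upcrossings_at_top_k` and the toy series are illustrative assumptions; they are neither the ozone data nor necessarily the procedure used in this paper.

```python
def upcrossings_at_top_k(xs, k):
    """Return (u, count): the level u and the number of i with xs[i] <= u < xs[i+1]."""
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(xs)")
    u = sorted(xs, reverse=True)[k]   # (k+1)-th largest observation
    count = sum(1 for a, b in zip(xs, xs[1:]) if a <= u < b)
    return u, count


if __name__ == "__main__":
    toy = [1, 5, 2, 6, 7, 3, 8, 4]    # hypothetical values, no ties
    print(upcrossings_at_top_k(toy, k=3))
```

With n = 120 observations and small k, the count returned by such a function can only be a handful, which is the source of the instability discussed above.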

7.2 Financial log-returns 

Financial time series are very unlikely to satisfy condition $D''$ because of their varying volatility.
Therefore, we shall now consider the time series of log-returns of the German stock market index DAX,
consisting of the 30 major German companies trading on the Frankfurt Stock Exchange. The daily
closing prices of the DAX index for the period from 1991 to 1998 that we work with are available as
dataset EuStockMarkets in the statistics software R, which we also used for all computations. This data
set of 1786 observations has been considered in Klar et al. [10], where a full statistical description
can be found. In Figure 9 we picture the DAX daily closing prices over the mentioned period, $x_t$, and the
log-returns, $100 \times (\ln x_t - \ln x_{t-1})$, the data to be analyzed.
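The transformation from closing prices to percentage log-returns can be sketched in a few lines of Python. The prices used below are made up for illustration and are not the DAX series; `log_returns` is a hypothetical helper name.

```python
import math


def log_returns(prices, scale=100.0):
    """Return scale * (ln x_t - ln x_{t-1}) for t = 1, ..., len(prices) - 1."""
    if any(p <= 0 for p in prices):
        raise ValueError("prices must be strictly positive")
    return [scale * (math.log(b) - math.log(a))
            for a, b in zip(prices, prices[1:])]


if __name__ == "__main__":
    closing = [100.0, 110.0, 99.0]    # hypothetical closing prices
    print(log_returns(closing))
```

A series of n prices yields n - 1 log-returns, and their sum equals the scaled log-ratio of the last to the first price.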




Figure 9: DAX daily closing prices (left) and daily log-returns (right) 

This time series of log-returns is well modeled by a covariance stationary GARCH(1,1) process
([10]). Using a goodness-of-fit test, Klar et al. [10] do not reject, at the 5% level, the hypothesis that the
innovations of the process follow a t-distribution with seven degrees of freedom.
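For readers who wish to experiment with a model of this type, the Python sketch below simulates a GARCH(1,1) process with Student-t innovations rescaled to unit variance, using the usual recursion $\sigma_t^2 = \omega + \alpha X_{t-1}^2 + \beta \sigma_{t-1}^2$. The values of $\omega$, $\alpha$ and $\beta$ are hypothetical placeholders and are not the estimates fitted to the DAX data in [10]; only the seven degrees of freedom come from the text. Covariance stationarity is enforced through the assumption $\alpha + \beta < 1$.

```python
import math
import random


def student_t(rng, df):
    """Draw a Student-t variate as Z / sqrt(chi2_df / df)."""
    z = rng.gauss(0.0, 1.0)
    chi2 = rng.gammavariate(df / 2.0, 2.0)   # chi-square with df degrees of freedom
    return z / math.sqrt(chi2 / df)


def simulate_garch11(n, omega, alpha, beta, df=7, seed=None):
    """Simulate n values of a GARCH(1,1) process with unit-variance t innovations."""
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha >= 0, beta >= 0, alpha + beta < 1")
    if df <= 2:
        raise ValueError("df must exceed 2 for a finite innovation variance")
    rng = random.Random(seed)
    scale = math.sqrt((df - 2) / df)         # t_df has variance df / (df - 2)
    sigma2 = omega / (1.0 - alpha - beta)    # start at the unconditional variance
    xs = []
    for _ in range(n):
        x = math.sqrt(sigma2) * scale * student_t(rng, df)
        xs.append(x)
        sigma2 = omega + alpha * x * x + beta * sigma2
    return xs


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only so that alpha + beta < 1.
    sample = simulate_garch11(1786, omega=0.05, alpha=0.10, beta=0.85, seed=42)
    print(len(sample), min(sample), max(sample))
```

Simulated paths of this kind show the volatility clustering that makes condition $D''$ implausible for log-returns.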
Figure 10: Estimates of the upcrossings index as a function of k for the DAX log-returns

The graph in Figure 10 allows us to conclude that $\eta$ is not equal to one, which agrees with the fact
that these processes are very unlikely to verify condition $D''$. The sample path exhibits a stability region
around a value close to $\eta = 0.8$.

Mikosch and Stărică [ll] derive the extremal index for squared GARCH(1,1) processes, whereas
Laurini and Tawn [11] propose an algorithm for the evaluation of the extremal index of GARCH(1,1)
processes with t-distributed innovations. It would be interesting, in future work, to obtain similar results
for the upcrossings index of a GARCH(1,1) process.

8 Conclusions 

The upcrossings index, as a measure of the clustering of upcrossings of high levels, is an important
parameter when studying extreme events. For sequences satisfying condition $\widetilde{D}^{(3)}$, which locally restricts
the dependence of the sequence but still allows clustering of upcrossings, we have proposed a simple
estimator for this parameter. The study of the properties of the proposed estimator, namely its
consistency and asymptotic normality, has been carried out in Sections 3 and 4. With simulations of
well-known autoregressive processes that verify condition $\widetilde{D}^{(3)}$, we were able to illustrate the performance of
the estimator. Case studies in the fields of environment and finance were also explored.

Relation (1.5) also allows one to estimate $\eta$ through the extremal index $\theta$, modified by consistent
estimators of the mean number of exceedances, $\tau$, and the mean number of upcrossings, $\nu$, of high levels.
Several estimators for the extremal index can be found in the literature (see Ancona-Navarrete and
Tawn [1] for a survey). Other estimators arise from Proposition 1.1 and from the relation with the
upcrossings-tail dependence coefficient. In future work we intend to propose new estimators for the
upcrossings index and compare the various estimation methods.